Incorporating Knowledge of Source Language Text in a System for Dictation of Document Translations

Authors

  • Aarthi Reddy
  • Richard Rose
  • Hani Safadi
  • Samuel Larkin
  • Gilles Boulianne
Abstract

This paper describes methods for integrating source language and target language information for machine-aided human translation (MAHT) of text documents. These methods are applied to a translation task in which a human translator dictates a first-draft translation of a source language document. A method is presented that integrates target language automatic speech recognition (ASR) models with source language statistical machine translation (SMT) and named entity recognition (NER) information at the phonetic level. Information extracted from the source language document, including translation model probabilities and translated named entities, is combined with acoustic-phonetic information obtained from phone lattices produced by the ASR system. Phone-level integration allows the combined MAHT system to correctly decode words that are either not in the ASR vocabulary or would have been incorrectly decoded by the ASR system. The combined MAHT system is shown to reduce the word error rate on the dictated translations by 32% relative to a standalone baseline ASR system.
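The headline result is a relative reduction in word error rate (WER). As a minimal sketch of how such a figure is conventionally computed, the Python below scores a hypothesis against a reference with word-level edit distance and then compares two WER values. The function names and the numbers in the usage note are illustrative assumptions; the abstract reports only the 32% relative figure, not the underlying absolute error rates, and this is not the authors' scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

With hypothetical values, a baseline WER of 0.25 and a combined-system WER of 0.17 would correspond to `relative_reduction(0.25, 0.17)`, that is, a 32% relative decrease. A relative figure of this kind says nothing by itself about the absolute error rates involved.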

Similar resources

Loss of the Socio-cultural Implicit Meanings in the English Translations of Mu’allaqat

Abstract Translation of literary texts, especially poetry, is one of the most difficult tasks; it requires mastery and knowledge of the language system and culture, and lack of this might lead to wrong translation. This study aimed to examine the loss and gain of the sociocultural implicit meanings in the English translations of the Mu’allaqat, and assess whether the translators of the Mu’allaq...

Integration of ASR and machine translation models in a document translation task

This paper is concerned with the problem of machine-aided human language translation. It addresses a translation scenario where a human translator dictates the spoken language translation of a source language text into an automatic speech dictation system. The source language text in this scenario is also presented to a statistical machine translation (SMT) system. The techniques presented in t...

Transmission of Ideology through Translation: A Critical Discourse Analysis of Chomsky’s “Media Control” and its Persian Translations

Among the factors that might influence translators' minds while producing a text is the notion of ideology transmission through text or talk. Adopting Critical Discourse Analysis (CDA) with particular emphasis on the framework of Van Dijk (1999), the present investigation is an attempt to shed light on the relationship between language and ideology involved in translation in general, and more speci...

Automatic text dictation in computer-assisted translation

In this paper, we study the incorporation of statistical machine translation models into automatic speech recognition models in the framework of computer-assisted translation. The system is given a source language text to be translated and shows it to the human translator, who translates it orally. The system captures the user's speech, which is the dictation of the target language sent...

Publication date: 2009